Relative Precision of Ability Estimations in Polytomous CAT: 

A Comparison under the Generalized Partial Credit Model and Graded Response Model 

Abstract 

The purpose of this Monte Carlo (MC) study was to evaluate the relative accuracy of 
Warm’s weighted likelihood estimate (WLE) compared with the maximum likelihood estimate (MLE), 
expected a posteriori estimate (EAP), and maximum a posteriori estimate (MAP), using the 
generalized partial credit model (GPCM) and graded response model (GRM) under a variety of 
computerized adaptive testing conditions. In general, for all four θ estimation methods, 
conditional and overall bias, standard error (SE), and root mean square error (RMSE) decreased 
as test length, test reliability, and item bank size increased. The magnitudes of the differences 
among the dependent variables decreased as the values of the independent variables increased. For 
both models, WLE outperformed MLE on all the dependent variables studied, and WLE 
performed better than the Bayesian methods in terms of bias. MLE had less bias than both 
Bayesian methods. In general, for fixed-length tests under both the GPCM and GRM, the 
estimation method and test length had some impact on bias, SE, and RMSE, but the model 
factor had the greatest impact on RMSE, accounting for 31.2% of the total variance of RMSE 
under the GRM. For fixed-reliability tests, the model factor had practically no 
influence on bias, SE, or RMSE under the GRM. 

Index terms: computerized adaptive testing, ability estimation methods, polytomous responses, 
item response theory. 
Introduction 

Computerized adaptive testing (CAT) using dichotomously scored item response models, 
such as the Rasch (1-PL), 2-PL, and 3-PL logistic models, is now found in many high-stakes 
educational and professional assessment programs. In practice, however, few CAT 
applications have been based on items with the more natural format handled by polytomous 
models, such as Samejima’s (1969) graded response model (GRM), Muraki’s (1992) generalized 
partial credit model (GPCM), Masters’s (1982) partial credit model, Bock’s (1972) nominal response model, 
and Andrich’s (1978) rating scale model. Because polytomously scored items provide a richer and more 
realistic form of assessment than dichotomously scored items, CAT with polytomously scored 
items could be a more valid and reasonable choice in some situations. In general, the 
advantages of a polytomous model are: (a) a polytomously scored item provides more item 
information than a dichotomously scored item (Baker, 1992; Bock, 1972; Samejima, 1969; 
Sympson, 1983; Thissen & Steinberg, 1984); and (b) the rate of detecting mismeasured examinees 
is greater with polytomously scored items than with dichotomously scored items. However, 
polytomous CATs are not widely used in educational and professional testing settings because 
machine scoring of polytomous items is still difficult to achieve. Recent research on computer 
scoring of open-ended items (Kukich, 2000; Yang, Buckendahl, Juszkiewicz, & Bhola, 2002) 
offers new hope for polytomous item-based CAT. 

In CAT, an examinee’s ability is estimated after each item response. The ability 
estimates not only affect the final outcome of testing but also determine which item is 
selected at each CAT stage. Four IRT-based ability estimates have been popular in CAT 
research and applications: (a) Warm’s weighted likelihood estimate (WLE), (b) 
maximum likelihood estimate (MLE), (c) expected a posteriori estimate (EAP), and (d) maximum 
a posteriori estimate (MAP). Previous studies (Bock & Mislevy, 1982; Wang & Vispoel, 1998; 
Weiss & McBride, 1984; Wang, 1995; Wang, Hanson, & Lau, 1999; Wang, 1999; Wang & 
Wang, 2001) have shown that Bayesian methods such as EAP and MAP are severely biased 
toward the mean of the prior distribution and are thus unacceptable to many standardized testing 
programs. MLE was found to have smaller bias, in the direction opposite to that of the Bayesian 
methods (i.e., estimates for low-ability examinees are negatively biased and those for high-ability 
examinees are positively biased), but a notably larger standard error (SE) than the Bayesian methods. 
Warm (1989) found that for 2- and 3-parameter IRT models, WLE was less biased than either 
MLE or the Bayesian methods. Wang and Wang (2001) showed that under Muraki’s (1992) 
generalized partial credit model (GPCM), WLE has better precision than MLE in fixed-length CAT. 
They also found that WLE and MLE have smaller bias but larger SE than both EAP and MAP, 
which is consistent with previous findings. Samejima (1998) adopted Warm’s approach, extended 
it to polytomous models, and formulated it for the graded response model (GRM). Wang, Hanson, 
and Lau (1999) and Wang and Wang (2001) demonstrated that Warm’s and Samejima’s approach is 
a special case of a more general approach proposed by Firth (1993), which has a more rigorous 
theoretical basis. 
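To make the contrast among the four estimators concrete, the sketch below scores one response pattern under the GPCM with each of them. It is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes a scaling constant of 1 (no D = 1.7), a standard normal prior for MAP and EAP, and a grid search over [-4, 4] in place of Newton-Raphson iterations. Because the GPCM exponent is linear in θ, Warm’s correction can be written here as maximizing L(θ)·I(θ)^{1/2}, which is the Firth form for this model; that shortcut should not be assumed to carry over unchanged to the GRM. The names `gpcm_probs`, `gpcm_info`, and `estimate_theta` and all item parameters are hypothetical.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """P(X = 0..m | theta) for one GPCM item at each value in the 1-D array theta."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]
    steps = a * (theta - np.asarray(b, dtype=float)[None, :])    # a(theta - b_v)
    z = np.hstack([np.zeros((theta.shape[0], 1)), np.cumsum(steps, axis=1)])
    ez = np.exp(z - z.max(axis=1, keepdims=True))                 # numerically stable softmax
    return ez / ez.sum(axis=1, keepdims=True)

def gpcm_info(theta, a, b):
    """Fisher information of a GPCM item: a^2 * Var(X | theta)."""
    p = gpcm_probs(theta, a, b)
    k = np.arange(p.shape[1])
    return a**2 * (p @ k**2 - (p @ k) ** 2)

def estimate_theta(responses, items, method, grid=np.linspace(-4, 4, 801)):
    """Grid-search MLE, WLE, MAP, or EAP for one GPCM response pattern.

    responses: observed category for each item; items: list of (a, b) pairs.
    MAP and EAP use an N(0, 1) prior (an assumption of this sketch).
    """
    loglik = np.zeros_like(grid)
    info = np.zeros_like(grid)
    for x, (a, b) in zip(responses, items):
        loglik += np.log(gpcm_probs(grid, a, b)[:, x])
        info += gpcm_info(grid, a, b)
    log_prior = -0.5 * grid**2                     # N(0, 1) log density up to a constant
    if method == "MLE":
        target = loglik
    elif method == "WLE":                          # Warm/Firth: maximize L(theta) * sqrt(I(theta))
        target = loglik + 0.5 * np.log(info)
    elif method == "MAP":
        target = loglik + log_prior
    elif method == "EAP":                          # posterior mean over the grid
        log_post = loglik + log_prior
        post = np.exp(log_post - log_post.max())
        return float(grid @ post / post.sum())
    else:
        raise ValueError(f"unknown method: {method}")
    return float(grid[np.argmax(target)])

# Hypothetical 3-item test: each item has a = 1.0 and steps (-1, 0, 1), i.e. categories 0..3.
items = [(1.0, [-1.0, 0.0, 1.0])] * 3
responses = [3, 2, 2]
estimates = {m: estimate_theta(responses, items, m) for m in ("MLE", "WLE", "MAP", "EAP")}
print(estimates)
```

In this toy example MAP and EAP are pulled toward the prior mean of 0, and WLE lies slightly inside MLE, which mirrors the direction of the biases described above; the numbers themselves carry no evidential weight.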

The GPCM and GRM are the two most commonly used IRT models for 
polytomously scored items. Both models have item discrimination parameters, but the GRM is a 
‘difference model’ and the GPCM is a ‘divide-by-total model’ (Thissen & Steinberg, 1986). The 
two models also differ in that the GPCM item category (step) parameters are not 
necessarily in successive order, whereas those of the graded response model are. 
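The ‘divide-by-total’ versus ‘difference’ distinction can be seen directly in how the category probabilities are formed. The sketch below is illustrative only: it assumes a scaling constant of 1, uses hypothetical parameters, and the function names are not from any particular package. The GPCM normalizes cumulative step terms and tolerates reversed steps, while the GRM subtracts adjacent cumulative (boundary) probabilities and therefore needs ordered boundaries.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """GPCM ('divide-by-total'): P(X = k) is proportional to exp(sum_{v<=k} a(theta - b_v))."""
    z = np.concatenate([[0.0], np.cumsum(a * (theta - np.asarray(b, dtype=float)))])
    ez = np.exp(z - z.max())
    return ez / ez.sum()

def grm_probs(theta, a, b):
    """GRM ('difference'): P(X = k) = P*_k - P*_{k+1}, with P*_k = P(X >= k)."""
    b = np.asarray(b, dtype=float)
    if np.any(np.diff(b) <= 0):
        raise ValueError("GRM category boundaries must be strictly increasing")
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))     # P(X >= k) for k = 1..m
    p_star = np.concatenate([[1.0], p_star, [0.0]])      # P(X >= 0) = 1, P(X >= m+1) = 0
    return p_star[:-1] - p_star[1:]

# Hypothetical item with a = 1.0 and three step/boundary parameters (four categories).
p_gpcm = gpcm_probs(0.0, 1.0, [0.5, -0.5, 1.0])   # reversed steps are allowed in the GPCM
p_grm = grm_probs(0.0, 1.0, [-0.5, 0.5, 1.0])     # GRM boundaries must be ordered
try:
    grm_probs(0.0, 1.0, [0.5, -0.5, 1.0])
    grm_accepts_unordered = True
except ValueError:
    grm_accepts_unordered = False
print(p_gpcm.round(3), p_grm.round(3), grm_accepts_unordered)
```

Rejecting unordered boundaries in `grm_probs` is a design choice of this sketch: without it, the differences P*_k - P*_{k+1} could turn negative and would no longer be probabilities.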

A few studies have examined the relative precision of those four ability estimation 
methods using different polytomous IRT models (Gorin, Dodd, Fitzpatrick, & Shieh, 2000; 
Wang, 1999; Wang & Wang, 2001). In particular, Wang and Wang (2001) systematically 
compared all four estimation methods under the GPCM model. However, no study has 
systematically compared the four ability estimation methods under the GRM, and no study has 
compared the GRM and GPCM under a similar set of conditions. 
The present study extends the Wang and Wang (2001) findings to the GRM and 
also makes some comparisons between the two models. It should be noted that the error indices 
under the two models cannot be compared in a strict sense because their trait scales are slightly 
different. Thus, the two models can only be compared in a general sense; for example, one can 
examine whether the relative precision of the ability estimation methods is consistent across the 
two models. The comparison may also provide practitioners with some guidance about which 
model to use when implementing CAT. 

Objectives 

The purposes of this paper are to (a) evaluate the relative precision (bias, SE, RMSE, and 
other indices) of four ability estimation methods, Warm’s weighted likelihood estimate (WLE), 
maximum likelihood estimate (MLE), expected a posteriori estimate (EAP), and maximum a 
posteriori estimate (MAP), under two polytomous models in CAT; and (b) compare the ability 
estimates of the two polytomous models, the generalized partial credit model (GPCM) and the 
graded response model (GRM), under various computerized adaptive testing (CAT) conditions. 

Method and Data 

A Monte Carlo simulation method was used to evaluate the ability estimation 
methods under the two polytomous models in this study. Both a real item bank consisting of 263 
polytomously scored 1996 NAEP science items (Allen, Carlson, & Zelenak, 1999) and a 
simulated item bank were used. The item bank was originally calibrated with the 
GPCM. To construct the item bank under the GRM, item responses for the entire 
item bank were generated for a large sample of simulees from a normally distributed population, 
and the response data were then calibrated under the GRM using PARSCALE. Three items 
were dropped during calibration because of poor fit, reducing the bank size to 260 
items for the GRM. These item parameters were treated as true item parameters in the 
simulation study. The items in the two smaller banks were randomly drawn from the larger bank 
containing 260 items. Tables 1 and 2 show descriptive statistics for the item parameter 
estimates of the three item banks under the generalized partial credit model and the graded response 
model. The simulations were conditioned on 21 true ability values ranging from -4.0 to 4.0 in 
increments of 0.4 for both the GPCM and GRM. A CAT was simulated for 500 simulees at each 
of the 21 ability points, using a maximum-information item selection procedure. The effects of 
four independent variables were examined using both descriptive and inferential procedures: 
item bank size (260, 66, and 33 items), test termination rule (fixed test length and fixed test 
reliability), estimation method (WLE, MLE, EAP, and MAP), and polytomous IRT model (GRM 
and GPCM). The dependent variables were bias, standard error (SE), root mean square 
error (RMSE), fidelity (the correlation between estimated and true ability parameters), and administrative 
efficiency (the mean number of items needed to reach a criterion SE level). 
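A single simulated CAT of the kind described above can be sketched as follows. This is a hedged illustration, not the study’s simulation code: it assumes a hypothetical randomly generated 66-item GPCM bank, EAP scoring with an N(0, 1) prior on a quadrature grid, the posterior SD used as the SE, and a fixed-reliability rule of 0.9 read as SE ≤ √(1 − 0.9) on a unit-variance ability scale. All names (`simulate_cat`, `eap`, `GRID`, `PRIOR`) are hypothetical.

```python
import numpy as np

GRID = np.linspace(-4.0, 4.0, 161)            # quadrature points for EAP
PRIOR = np.exp(-0.5 * GRID**2)                # N(0, 1) prior, unnormalized (assumption)

def gpcm_probs(theta, a, b):
    """P(X = 0..m | theta) for one GPCM item; theta is a 1-D array."""
    steps = a * (theta[:, None] - np.asarray(b, dtype=float)[None, :])
    z = np.hstack([np.zeros((len(theta), 1)), np.cumsum(steps, axis=1)])
    ez = np.exp(z - z.max(axis=1, keepdims=True))
    return ez / ez.sum(axis=1, keepdims=True)

def gpcm_info(theta, a, b):
    """GPCM item information a^2 * Var(X | theta)."""
    p = gpcm_probs(theta, a, b)
    k = np.arange(p.shape[1])
    return a**2 * (p @ k**2 - (p @ k) ** 2)

def eap(loglik):
    """Posterior mean and posterior SD of theta on GRID."""
    post = PRIOR * np.exp(loglik - loglik.max())
    post /= post.sum()
    mean = float(GRID @ post)
    return mean, float(np.sqrt(((GRID - mean) ** 2) @ post))

def simulate_cat(true_theta, bank, rng, max_items=10, se_target=None):
    """One simulated CAT with maximum-information item selection and EAP scoring.

    Stops after max_items items, or earlier once the posterior SD <= se_target.
    """
    loglik = np.zeros_like(GRID)
    theta_hat, se = 0.0, np.inf
    used = []
    limit = min(max_items, len(bank))
    while len(used) < limit and (se_target is None or se > se_target):
        # Information of every unused item at the current ability estimate.
        infos = [-np.inf if j in used else gpcm_info(np.array([theta_hat]), a, b)[0]
                 for j, (a, b) in enumerate(bank)]
        j = int(np.argmax(infos))
        used.append(j)
        a, b = bank[j]
        p = gpcm_probs(np.array([true_theta]), a, b)[0]
        x = rng.choice(len(p), p=p)                  # simulated item response
        loglik += np.log(gpcm_probs(GRID, a, b)[:, x])
        theta_hat, se = eap(loglik)
    return theta_hat, se, len(used)

# Hypothetical 66-item bank with four score categories per item.
rng = np.random.default_rng(2002)
bank = [(rng.uniform(0.4, 1.6), rng.normal(0.0, 1.0, size=3)) for _ in range(66)]

# Fixed test length of 10 items.
est_len, se_len, n_len = simulate_cat(1.2, bank, rng, max_items=10)

# Fixed reliability of 0.9, read here as SE <= sqrt(1 - 0.9) (assumption).
se_crit = np.sqrt(1.0 - 0.9)
est_rel, se_rel, n_rel = simulate_cat(1.2, bank, rng, max_items=len(bank), se_target=se_crit)
print(est_len, se_len, n_len, est_rel, se_rel, n_rel)
```

Repeating `simulate_cat` 500 times at each of the 21 true ability values would produce the replications summarized by the error indices; the number of items used under the SE rule corresponds to the administrative-efficiency measure.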

Conditional Error Indexes: 

$$\mathrm{Bias}(\theta) = \frac{1}{N}\sum_{r=1}^{N}\left(\hat{\theta}_r - \theta\right),$$

$$\mathrm{SE}(\theta) = \sqrt{\frac{1}{N}\sum_{r=1}^{N}\left(\hat{\theta}_r - \bar{\hat{\theta}}\right)^2}, \qquad \bar{\hat{\theta}} = \frac{1}{N}\sum_{r=1}^{N}\hat{\theta}_r,$$

$$\mathrm{RMSE}(\theta) = \sqrt{\frac{1}{N}\sum_{r=1}^{N}\left(\hat{\theta}_r - \theta\right)^2},$$

where $\theta$ is the true ability of the simulees, which was used to generate responses in the simulation, 
$\hat{\theta}_r$ is the estimated ability for the rth replication, and $N$ is the number of replications. The 
number of replications in this MC study is the analogue of sample size. Because the primary 
goal is to assess the relative accuracy of the ability estimation methods, the significance of each 
statistic is tested and empirical sampling distributions of the statistics are generated. To 
minimize sampling variance and increase the power to detect the effects of interest, a 
large number of replications is desirable. In this study, relative accuracy is assessed by 
comparing the differences between the ability estimates and the true ability across 
replications; for such a study, 500 replications are considered sufficient (Stone, 1993). The 
RMSE can be separated into two components, bias and SE:

$$\mathrm{RMSE}^2(\theta) = \mathrm{Bias}^2(\theta) + \mathrm{SE}^2(\theta).$$
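As a small numerical check of these conditional indices, the sketch below computes them from a vector of replicated estimates at one true ability level. The function name and the five estimates are hypothetical (the study used N = 500). The SE uses the divisor N, which is what makes the squared RMSE equal the squared bias plus the squared SE exactly.

```python
import numpy as np

def conditional_indices(estimates, true_theta):
    """Bias, SE, and RMSE of the estimates obtained at one true ability level."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_theta
    se = est.std()                      # divisor N (ddof=0), so RMSE^2 = Bias^2 + SE^2 exactly
    rmse = np.sqrt(np.mean((est - true_theta) ** 2))
    return bias, se, rmse

# Toy example: 5 hypothetical replications at true theta = 1.0.
bias, se, rmse = conditional_indices([1.1, 1.3, 0.9, 1.5, 1.2], 1.0)
print(round(bias, 4), round(se, 4), round(rmse, 4))   # 0.2 0.2 0.2828
```

Here 0.2² + 0.2² = 0.08 = 0.2828², so half of the mean squared error comes from bias and half from sampling variability.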

Overall Error Indexes: 

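$$\mathrm{AVERAGE}_{\mathrm{Bias}} = \sum_{i=1}^{21} \left|\mathrm{Bias}(\theta_i)\right| \, w(\theta_i),$$

$$\mathrm{AVERAGE}_{\mathrm{SE}} = \sum_{i=1}^{21} \mathrm{SE}(\theta_i) \, w(\theta_i),$$

$$\mathrm{AVERAGE}_{\mathrm{RMSE}} = \sum_{i=1}^{21} \mathrm{RMSE}(\theta_i) \, w(\theta_i),$$

where the $w(\theta_i)$ are quadrature weights based on the standard normal distribution, and the $\theta_i$ 
are the 21 equally spaced true ability levels that range from -4 to 4 in increments of 0.4. 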
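The overall indices can be sketched as weighted sums over the 21 ability levels. The source does not spell out how the quadrature weights were normalized, so this sketch assumes normal-density weights rescaled to sum to 1; `overall_index` and the conditional curves are hypothetical. The example also shows why the overall bias index averages absolute bias: an inward bias that is positive below 0 and negative above 0 would otherwise cancel to zero.

```python
import numpy as np

theta_levels = np.linspace(-4.0, 4.0, 21)     # -4.0, -3.6, ..., 4.0 (21 levels)
density = np.exp(-0.5 * theta_levels**2)      # standard normal density, up to a constant
weights = density / density.sum()             # assumption: weights rescaled to sum to 1

def overall_index(conditional_values, absolute=False):
    """Weighted average of a conditional index over the 21 ability levels."""
    v = np.asarray(conditional_values, dtype=float)
    if absolute:
        v = np.abs(v)                         # the overall bias index averages |Bias(theta_i)|
    return float(np.sum(v * weights))

# Hypothetical conditional results: a constant SE and an inward (Bayesian-like) bias.
se_curve = np.full(21, 0.3)
bias_curve = -0.1 * theta_levels              # negative above 0, positive below 0
print(overall_index(se_curve), overall_index(bias_curve), overall_index(bias_curve, absolute=True))
```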

Four experimental designs were used in the analyses of the overall indices. For the fixed-length 
tests, two completely crossed analysis of variance (ANOVA) designs were used: 4 θ estimation 
methods × 3 bank sizes × 4 test lengths, and 4 θ estimation methods × 4 test lengths × 2 models. 
For the fixed-reliability tests, two completely crossed ANOVA designs were used: 4 θ estimation 
methods × 3 bank sizes × 3 reliability levels, and 4 θ estimation methods × 3 reliability levels × 
2 models. 
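The Results report how much of the total variance of each overall index a factor accounts for. The source does not give the formula, so this sketch assumes the usual η² = SS_effect / SS_total for a main effect in a balanced, fully crossed design. The function name and the additive toy data on a 4 methods × 3 bank sizes × 4 test lengths grid are hypothetical.

```python
import numpy as np

def main_effect_share(y, axis):
    """eta^2 = SS_effect / SS_total for the main effect on `axis` of a balanced, fully crossed array."""
    y = np.asarray(y, dtype=float)
    grand = y.mean()
    ss_total = np.sum((y - grand) ** 2)
    other_axes = tuple(i for i in range(y.ndim) if i != axis)
    level_means = y.mean(axis=other_axes)
    n_per_level = y.size // y.shape[axis]          # cells averaged into each level mean
    ss_effect = n_per_level * np.sum((level_means - grand) ** 2)
    return float(ss_effect / ss_total)

# Hypothetical overall RMSE values on a 4 methods x 3 bank sizes x 4 test lengths grid,
# built as purely additive effects (no interactions) for illustration.
method_eff = np.array([0.0, 0.1, 0.2, 0.3])
bank_eff = np.zeros(3)
length_eff = np.array([0.3, 0.2, 0.1, 0.0])
y = method_eff[:, None, None] + bank_eff[None, :, None] + length_eff[None, None, :]
shares = [main_effect_share(y, ax) for ax in range(3)]
print([round(s, 3) for s in shares])   # [0.5, 0.0, 0.5]
```

Because the toy data have no interactions, the main-effect shares sum to 1; with real simulation output the interactions and error term would take up the remainder.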
Results 

Conditional Indices 

Figures 1 through 3 show the bias, SE, and RMSE of the four ability estimates for a fixed test 
length of 10 items under both models. The WLE has the smallest absolute bias among all the 
methods and a smaller SE than MLE over almost the entire ability range, for both the GRM and 
GPCM. Both WLE and MLE have considerably less bias than the two Bayesian methods under 
both models. The two models show approximately the same precision patterns at almost all 
ability levels for fixed-length CATs, although they are not strictly comparable. 



Insert Figures 1 to 3 about here 



Figures 4 through 6 show the bias, SE, and RMSE of the four ability estimates for a fixed test 
reliability of 0.9 under both models. First, for both models, WLE and MLE have remarkably 
smaller bias than EAP and MAP, especially at the extreme ability levels. Second, for both 
models, all methods show the same level of SE. Finally, for both models, WLE and MLE 
have smaller RMSE than EAP and MAP. In general, there is no large difference in bias, SE, or 
RMSE between the GPCM and GRM. 



Insert Figures 4 to 6 about here 



In general, the results of the graded response model agreed with those for the generalized 
partial credit model (Wang & Wang, 2001). 
Overall Indices 

Table 3 summarizes the results of the three-way ANOVAs of absolute bias, SE, and RMSE 
(averaged across θ levels) for the fixed test length and fixed reliability termination conditions 
under the graded response model. In general, the results for the overall indices further support 
the results for the conditional indices for both models. For the GRM under the fixed test length 
termination rule, θ estimation method and test length accounted for 27.5% and 29.3%, 
respectively, of the total variance of absolute bias and had the largest influence on absolute bias. 
In comparison, for the GPCM, the θ estimation method had the largest influence on absolute bias 
(Wang & Wang, 2001). Under the fixed reliability termination condition for the GRM, the θ 
estimation method had the largest influence on absolute bias, accounting for 80.5% of its total 
variance; this result matches that for the GPCM. As with the GPCM, test length under the fixed 
test length rule and test reliability under the fixed reliability rule had the largest influences on 
RMSE for the GRM, accounting for 51.1% and 90.9%, respectively, of the total variance of RMSE. 

Table 4 provides the results of the three-way ANOVAs of absolute bias, SE, and RMSE 
(averaged across θ levels) for the fixed test length and fixed reliability termination conditions 
under both models. Instead of bank size, model was included as one of the 
three factors (method, test length or reliability, and model). 

For the fixed test length termination conditions, all main effects of method, test length, and 
model on absolute bias, SE, and RMSE were statistically significant. Although the model factor 
accounted for only 4% and 6.9% of the total variance of bias and SE, respectively, it accounted 
for 31.2% of the total variance of RMSE. None of the interaction effects for bias, SE, or RMSE 
was statistically significant at the 0.01 level except the method × test length interaction for SE 
and RMSE. The θ estimation method had the greatest influence on absolute bias, accounting for 
31.8% of the total variance of absolute bias; test length had the greatest influence on SE, 
accounting for 51.5% of the total variance of SE. 

For the fixed test reliability termination condition, all main effects of method, test 
reliability, and model on absolute bias, SE, and RMSE were statistically significant at the 0.01 
level except the effect of model on SE. For bias, the three-factor interaction was not 
significant, and all three two-factor interactions were significant. For SE and RMSE, none of 
the two-factor or three-factor interactions was statistically significant. Again, the θ estimation 
method had the greatest influence on absolute bias, accounting for 76.7% of the total variance of 
absolute bias; test reliability had the greatest influence on SE and RMSE, accounting for 54.7% of 
the total variance of SE and 88.5% of the total variance of RMSE. 



Summary and Discussion 

This study examined the relative precision of four ability estimation methods (WLE, MLE, 
EAP, and MAP) under two polytomous models (GPCM and GRM) in the CAT environment and 
compared the relative precision of the GPCM and GRM. In general, for all 
four θ estimation methods, conditional and overall bias, SE, and RMSE decreased as test 
length, test reliability, and item bank size increased. The magnitudes of the differences among 
the dependent variables decreased as the values of the independent variables increased. For both 
models, WLE outperformed MLE on all the dependent variables studied, and WLE 
performed better than the Bayesian methods in terms of bias. MLE had less bias than both 
Bayesian methods. Both EAP and MAP showed more favorable SE and fidelity than 
either WLE or MLE, and EAP performed better than MAP under almost all conditions. 

Different test termination rules had a significant impact on these dependent variables for a given 
ability estimation method, especially for WLE and MLE. Although the quality of 
the item bank has large effects on the conditional distributions of bias, SE, RMSE, and test efficiency 
(Wang & Vispoel, 1998), item bank size had less impact on the differences among the 
dependent variables than did the test termination rules. This study confirms Warm’s conclusions 
that (a) WLE is unbiased to first order for fixed-length tests, whereas MLE, EAP, and 
MAP are biased, and (b) WLE has small variance over the entire range of θ in fixed-length 
CATs. 

In general, for fixed-length tests under both the GPCM and GRM, the estimation method 
and test length had some impact on bias, SE, and RMSE, but the model factor had the 
largest impact on RMSE, accounting for 31.2% of the total variance of RMSE under the GRM. For 
fixed-reliability tests, the model factor had almost no influence on bias, SE, or RMSE under the 
GRM. 

As CAT with polytomous models can be applied to a variety of polytomously scored items 
and implemented in more and more testing programs, the search for a sound ability 
estimation method for a particular polytomous IRT model becomes increasingly important. 
MLE has been widely used in many CAT programs because of its relatively small bias. The present 
study shows that under both the GRM and GPCM, with the fixed test length rule, WLE not only 
reduced the bias of MLE to almost zero but also reduced its SE. As computer scoring of 
polytomously scored items becomes more of a reality, the results of this study will have greater 
practical significance. 
Figure 1. Bias curves of the ability estimation methods of two models, 
test length = 10, bank size = 263 (260) 

Figure 2. SE curves of the ability estimation methods of two models, 
test length = 10, bank size = 263 (260) 

Figure 3. RMSE curves of the ability estimation methods of two models, 
test length = 10, bank size = 263 (260) 

Figure 4. Bias curves of the ability estimation methods of two models, 
reliability = 0.9, bank size = 263 (260) 

Figure 5. SE curves of the ability estimation methods of two models, 
reliability = 0.9, bank size = 263 (260) 

Figure 6. RMSE curves of the ability estimation methods of two models, 
reliability = 0.9, bank size = 263 (260) 
Table 1 

Descriptive Statistics for the Estimates of Item Parameters of the Three Item Banks, 
1GPCM, 2GPCM, and 3GPCM, under the Generalized Partial Credit Model 

Bank    No. Items   Parameter     Mean    Median    S.D.    Minimum   Maximum
1GPCM   263         a             0.549    0.522    0.229     0.105     1.871
                    b1            0.713    0.720    2.011    -6.972    11.746
                    b2            1.270    1.264    2.640   -17.381    13.926
                    b3            1.034    1.004    2.371    -6.369     7.187
                    b4            0.822    0.822    2.546    -3.159     4.924
2GPCM   66          a             0.539    0.527    0.171     0.171     1.200
                    b1            1.066    1.000    1.728    -3.204     7.399
                    b2            1.679    1.491    2.519    -2.665    13.926
                    b3            1.832    1.412    1.656    -0.856     5.506
                    b4            4.270    4.270    0.535     0.535     4.925
3GPCM   33          a             0.560    0.523    0.190     1.90      1.055
                    b1            0.752    0.631    1.384    -2.738     3.437
                    b2            1.695    1.684    2.495    -3.638     7.293
                    b3            1.467    1.680    3.480    -6.369     7.187
                    b4            2.000    2.000    0.000     2.000     2.000
Table 2 

Descriptive Statistics for the Estimates of Item Parameters of the Three Item Banks, 
1GRM, 2GRM, and 3GRM, under the Graded Response Model 

Bank    No. Items   Parameter     Mean    Median    S.D.    Minimum   Maximum
1GRM    260         a             0.658    0.668    0.347     0.180     2.206
                    b1           -0.889   -0.568    2.066   -20.105     3.066
                    b2            1.496    1.163    2.245    -9.962    10.627
                    b3            1.837    1.941    3.475   -17.578    12.767
                    b4            2.033    2.096    1.500    -0.158     4.312
2GRM    66          a             0.620    0.647    0.273     0.074     1.098
                    b1           -0.834   -0.620    1.385    -5.565     3.066
                    b2            1.590    1.108    2.291    -3.079     8.627
                    b3            2.184    2.140    1.229     0.600     4.312
                    b4            3.072    3.072    0.000     3.072     3.072
3GRM    33          a             0.678    0.693    0.333     0.065     1.301
                    b1           -0.980    0.803    1.594    -6.853     2.688
                    b2            1.374    1.125    1.683    -1.390     5.446
                    b3            1.703    1.164    1.639     0.304     5.231
                    b4            2.096    2.096    0.000     2.096     2.096
Table 3 

Results of ANOVA with Fixed Test Length and Fixed Test Reliability Termination Rules for the GRM 

















Table 4 

Results of ANOVA with Fixed Test Length and Fixed Test Reliability Termination Rules for the GRM and GPCM 